 Stark County


Geological Inference from Textual Data using Word Embeddings

Linphrachaya, Nanmanas, Gómez-Méndez, Irving, Siripatana, Adil

arXiv.org Artificial Intelligence

This research explores the use of Natural Language Processing (NLP) techniques to locate geological resources, with a specific focus on industrial minerals. By using word embeddings trained with the GloVe model, we extract semantic relationships between target keywords and a corpus of geological texts. The text is filtered to retain only words with geographical significance, such as city names, which are then ranked by their cosine similarity to the target keyword. Dimensional reduction techniques, including Principal Component Analysis (PCA), Autoencoder, Variational Autoencoder (VAE), and VAE with Long Short-Term Memory (VAE-LSTM), are applied to enhance feature extraction and improve the accuracy of semantic relations. For benchmarking, we use the haversine formula to calculate the proximity between the ten cities most semantically related to the target keyword and identified mine locations. The results demonstrate that combining NLP with dimensional reduction techniques provides meaningful insights into the spatial distribution of natural resources. Although the predicted locations fall in the same region as the known mine locations, the accuracy has room for improvement.
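The ranking and benchmarking steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the two-dimensional toy vectors, and the city labels are hypothetical, and a real pipeline would use GloVe vectors (optionally after dimensional reduction) in place of the toy data. The sketch ranks candidate place names by cosine similarity to a target keyword vector, then measures great-circle distance with the haversine formula, assuming a mean Earth radius of 6371 km.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(target_vec, place_vecs, top_k=10):
    """Return the top_k (place, score) pairs, most similar first."""
    scored = [
        (place, cosine_similarity(target_vec, vec))
        for place, vec in place_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical toy embeddings; real ones would come from a trained model.
target = [1.0, 0.0]
places = {"city_a": [1.0, 0.1], "city_b": [0.0, 1.0], "city_c": [0.5, 0.5]}
ranking = rank_by_similarity(target, places, top_k=2)
```

With real data, each of the ten top-ranked cities would be geocoded and its haversine distance to the nearest known mine would serve as the benchmark.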


Data Science Education in Undergraduate Physics: Lessons Learned from a Community of Practice

Shah, Karan, Butler, Julie, Knaub, Alexis, Zenginoğlu, Anıl, Ratcliff, William, Soltanieh-ha, Mohammad

arXiv.org Artificial Intelligence

It is becoming increasingly important that physics educators equip their students with the skills to work with data effectively. However, many educators may lack the necessary training and expertise in data science to teach these skills. To address this gap, we created the Data Science Education Community of Practice (DSECOP), bringing together graduate students and physics educators from different institutions and backgrounds to share best practices and lessons learned from integrating data science into undergraduate physics education. In this article we present insights and experiences from this community of practice, highlighting key strategies and challenges in incorporating data science into the introductory physics curriculum. Our goal is to provide guidance and inspiration to educators who seek to integrate data science into their teaching, helping to prepare the next generation of physicists for a data-driven world.


AmbigDocs: Reasoning across Documents on Different Entities under the Same Name

Lee, Yoonsang, Ye, Xi, Choi, Eunsol

arXiv.org Artificial Intelligence

Different entities with the same name can be difficult to distinguish. Handling confusing entity mentions is a crucial skill for language models (LMs). For example, given the question "Where was Michael Jordan educated?" and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question? To test this ability, we introduce a new benchmark, AmbigDocs. By leveraging Wikipedia's disambiguation pages, we identify a set of documents, belonging to different entities who share an ambiguous name. From these documents, we generate questions containing an ambiguous name and their corresponding sets of answers. Our analysis reveals that current state-of-the-art models often yield ambiguous answers or incorrectly merge information belonging to different entities. We establish an ontology categorizing four types of incomplete answers and automatic evaluation metrics to identify such categories. We lay the foundation for future work on reasoning across multiple documents with ambiguous entities.


Neural Approaches to Entity-Centric Information Extraction

Zaporojets, Klim

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) has a huge impact on our daily lives, with applications such as voice assistants, facial recognition, chatbots, and autonomously driving cars. Natural Language Processing (NLP) is a cross-discipline of AI and Linguistics, dedicated to studying the understanding of text. This is a very challenging area due to the unstructured nature of language, with many ambiguous and corner cases. In this thesis we address a very specific area of NLP that involves the understanding of entities (e.g., names of people, organizations, locations) in text. First, we introduce a radically different, entity-centric view of the information in text. We argue that instead of using individual mentions in text to understand their meaning, we should build applications that work in terms of entity concepts. Next, we present a more detailed model of how the entity-centric approach can be used for the entity linking task. In our work, we show that this task can be improved by performing entity linking at the coreference cluster level rather than for each of the mentions individually. In our next work, we further study how information from Knowledge Base entities can be integrated into text. Finally, we analyze the evolution of entities from a temporal perspective.


Machine Learning Technique Predicting Video Streaming Views to Reduce Cost of Cloud Services

Darwich, Mahmoud

arXiv.org Artificial Intelligence

Video streams occupy the largest portion of online traffic. Multiple versions of a video are created to fit the user's device specifications. In cloud storage, keeping all versions of frequently accessed video streams in the repository for the long term imposes a significant cost on video streaming providers. Generally, the popularity of a video changes from period to period, which means the number of views received by a video can drop, at which point the video should be deleted from the repository. Therefore, in this paper, we develop a method that predicts the popularity of each video stream in the repository in the next period. In addition, we propose an algorithm that uses the predicted popularity of a video to compute the storage cost and then decides whether the video will be kept in or deleted from the cloud repository. The experimental results show a 15% reduction in the cost of cloud services compared to keeping all video streams.
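The keep-or-delete decision described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the abstract does not give the predictor or the cost model, so a moving average stands in for the machine learning predictor, and the rule compares the storage cost for one period against a hypothetical cost of regenerating the version on demand. All names and prices are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoVersion:
    name: str
    size_gb: float
    recent_views: List[int]  # views per past period, oldest first


def predict_next_views(recent_views, window=3):
    """Stand-in predictor: mean of the last `window` periods (assumption)."""
    tail = recent_views[-window:]
    return sum(tail) / len(tail) if tail else 0.0


def should_keep(video, storage_cost_per_gb, transcode_cost_per_view, window=3):
    """Keep the version if storing it costs no more than serving the
    predicted views by regenerating it on demand (hypothetical cost model)."""
    predicted = predict_next_views(video.recent_views, window)
    storage_cost = video.size_gb * storage_cost_per_gb
    on_demand_cost = predicted * transcode_cost_per_view
    return storage_cost <= on_demand_cost


# Hypothetical prices per period, chosen only to make the example concrete.
popular = VideoVersion("popular_720p", 2.0, [100, 120, 110])
fading = VideoVersion("fading_720p", 2.0, [5, 1, 0])
decisions = {
    v.name: should_keep(v, storage_cost_per_gb=0.02, transcode_cost_per_view=0.01)
    for v in (popular, fading)
}
```

Under these toy prices the popular version is kept and the fading one is deleted; any real deployment would replace both the predictor and the cost terms with measured values.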


Tesla Autopilot head Andrej Karpathy leaves as company faces renewed crash probes

Daily Mail - Science & tech

Tesla Director of Artificial Intelligence and Autopilot Andrej Karpathy is leaving the company at a critical time - as it faces renewed probes over crashes and growing scrutiny. 'It's been a great pleasure to help Tesla towards its goals over the last 5 years and a difficult decision to part ways. In that time, Autopilot graduated from lane keeping to city streets and I look forward to seeing the exceptionally strong Autopilot team continue that momentum,' he wrote on Twitter, noting that he has no plans for what's next. Tesla CEO Elon Musk replied to thank him for his work at the company. The leadership change comes at a challenging time, as Tesla faces renewed scrutiny from US regulators over crashes involving drivers who used Autopilot and works to expand the latest version of Full Self Driving (FSD) to a larger number of customers.


Table-based Fact Verification with Salience-aware Learning

Wang, Fei, Sun, Kexuan, Pujara, Jay, Szekely, Pedro, Chen, Muhao

arXiv.org Artificial Intelligence

Tables provide valuable knowledge that can be used to verify textual statements. While a number of works have considered table-based fact verification, direct alignments of tabular data with tokens in textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address these problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation allows enhanced learning of fact verification from two perspectives. From one perspective, our system conducts masked salient token prediction to enhance the model for alignment and reasoning between the table and the statement. From the other perspective, our system applies salience-aware data augmentation to generate a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show the effective improvement by the proposed salience-aware learning techniques, leading to the new SOTA performance on the benchmark. Our code is publicly available at https://github.com/luka-group/Salience-aware-Learning .


Determining Sentencing Recommendations and Patentability Using a Machine Learning Trained Expert System

Brown, Logan, Pezewski, Reid, Straub, Jeremy

arXiv.org Artificial Intelligence

This paper presents two studies that use a machine learning expert system (MLES). One focuses on a system to advise United States federal judges on consistent federal criminal sentencing, based on both the federal sentencing guidelines and offender characteristics. The other study aims to develop a system that could prospectively help the U.S. Patent and Trademark Office automate its patentability assessment process. Both studies use a machine learning-trained rule-fact expert system network to accept input variables for training and presentation and output a scaled variable that represents the system recommendation (e.g., the sentence length or the patentability assessment). This paper presents and compares the rule-fact networks that have been developed for these projects. It explains the decision-making process underlying the structures used for both networks and the pre-processing of data that was needed and performed. It also, through comparing the two systems, discusses how different methods can be used with the MLES.


Leaked emails from Tesla says its 'Full Self-Driving' beta will 'remain largely unchanged'

Daily Mail - Science & tech

Elon Musk has been banging the drum for Tesla's 'Full Self-Driving' (FSD) for more than five years, but a number of leaked emails reveal the technology is far off from providing hands-free capabilities. Documents between Tesla attorneys and the California Department of Motor Vehicles (DMV) say vehicles using the firm's latest beta version, known as 'Autosteer on City Streets', will not surpass Level 2 autonomy. This level of autonomy requires drivers to remain aware and control the brake, accelerator and steering - despite Musk promising 'full self driving' by 2021. Attorneys for the carmaker said the FSD beta upgrade 'does not make it autonomous under the DMV's definition,' along with stating that the Level 2 capability will 'remain largely unchanged' in a full customer rollout. 'City Streets continues to firmly root the vehicle in SAE Level 2 capability and does not make it autonomous under the DMV's definition,' wrote Eric Williams, Tesla associate general counsel, in a statement attached to an email with the California DMV that has been published to PlainSite.


Gaming on a Budget? Try Your Local Library

WIRED

In the immortal words of Arthur the Aardvark, "Havin' fun isn't hard, when you've got a library card!" But how much fun can you really have with a library card? Turns out, more than I expected. Libraries across America are adding video games to their collections available for checkout. Gamers with an incessant appetite for new experiences or anyone looking to play video games for free should contact their local library to see if they have a collection.